Goto

Collaborating Authors

 differential privacy


Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

arXiv.org Machine Learning

We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $ฮ”_{\min}$, the known lower bound is of order $ \frac{\log K}{\min\{ฮ”_{\min},\varepsilon\}}, $ or equivalently, up to universal constants, of order \[ \frac{\log K}{ฮ”_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{ฮ”_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.


From Privacy to Generalization: Linear Max-Information Bounds for DP-SGD

arXiv.org Machine Learning

Understanding the relationship between generalization and privacy remains a central challenge in modern machine learning theory, particularly for deep networks trained by variants of differentially private stochastic gradient descent (DP-SGD). In this work we make progress on this persistent open problem by proving a finite-sample bound on the approximate max-information of DP-SGD that exhibits scaling properties comparable with (Dwork et al, 2015)'s classic result for $ฮต$-differentially private algorithms, namely at most linear in the dataset size. From our result we obtain a general-purpose PAC-Bayes generalization bound in which the necessary prior distribution can be learned by DP-SGD, as well as a generalization bound for DP-SGD-trained models themselves, with a complexity term that is fully explicit and controlled by the optimization hyperparameters.


Modulated learning for private and distributed regression with just a single sample per client device

arXiv.org Machine Learning

This work focuses on the question of learning from a large number of devices with each device holding only a single sample of data. Several real-world applications exist to this one sample per client setup up including learning from fitness trackers, data/app usage aggregators, body-worn sensing devices, and daily event monitors to name a few. When a client has only one sample, the standard federated learning paradigm breaks down as a local update based on that single point is far from being useful, especially in the earlier rounds for estimation of the model coefficients. This utility is further weakened by the privacy-inducing noise applied at every round. This work caters to this problem to enable such clients to collaboratively contribute to effectively learn a global model without leaking the privacy of their data. The proposed approach injects a single, carefully calibrated noisy perturbation to transform the sample at each client, followed by a post-processed representation which is shared with the server. These representations aggregated at the server are processed to obtain an unbiased gradient update that in expectation matches the non-private centralized gradient while preserving data privacy. This approach is different than traditional private federated learning, where the communication payloads involve model coefficients as opposed to privately transformed data samples. This method enables devices with extremely limited data to collaborate and learn accurate, privacy-preserving models without requiring large local datasets or sacrificing individual privacy.


Private Adaptive Covariance Estimation via Gaussian Graphical Models

arXiv.org Machine Learning

We propose PACE-GGM, a data-adaptive differentially private method for covariance estimation that concentrates its privacy budget on the most informative entries of the empirical covariance matrix, rather than perturbing all entries. This applies in the natural setting where the modeler supplies separate bounds for each variable, so that individual entries can be measured with less noise than the full matrix. In each round, our method selects a poorly approximated entry, measures it using the Gaussian mechanism, and then reconstructs a full covariance matrix using a maximum-entropy reconstruction objective, leading to a Gaussian graphical model structure. Experiments on diverse real-world datasets demonstrate consistent improvements in estimation error with respect to the Gaussian mechanism and other baselines, particularly in high-dimensional and low-to-moderate privacy regimes.


On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

arXiv.org Machine Learning

We consider the problem of sampling from an unnormalized Boltzmann/ Gibbs density, ฯ€(ฮธ) exp V(ฮธ),ฮธ ฮ˜ Rd, where the normalization constant is unknown (and/or intractable) and only the potential function V (and typically its derivatives) can be evaluated. This problem arises across various domains in Bayesian inference, statistical physics, and modern machine learning. A common variational perspective on sampling is to characterize the target distribution ฯ€ as the unique minimizer of a functional (typically a divergence functional) over the space of probability measures. From this viewpoint, sampling can be formulated as evolving an initial distribution ฯ0 toward ฯ€ via the gradient flow of this functional under a suitable geometric structure on the space of probability measures. In this paper, we focus on a gradient flow based sampling methodology built from the spherical Hellinger Kantorovich (SHK), also known as the Wasserstein Fisher Rao (WFR), geometry on the space of probability measures (Kondratyev and Vorotnikov, 2019; Liero et al., 2018; Chizat et al., 2015). When the variational objective is the exclusive KL divergence ฯ 7 KL(ฯ ฯ€), the SHK gradient flow generates a time-indexed family of marginals {ฯt}t 0 (initialized at ฯ0 P2(ฮ˜)) that evolves according to the continuity reaction equation (4). This evolution is equivalent to the birth-death Langevin dynamics introduced in Lu et al. (2019) .


Node-private community estimation in stochastic block models: Tractable algorithms and lower bounds

arXiv.org Machine Learning

We study the classical problem of community recovery in stochastic block models with a fixed number of communities, with a twist: We seek algorithms that are stable with respect to node-wise changes in the graph structure, formally defined as a differential privacy constraint. The algorithms we develop are based on spectral clustering, where we introduce privacy to the community recovery pipeline in the form of directly privatizing the adjacency matrix; private PCA; private convex optimization; private low-rank matrix estimation; and private approximate subspace estimation. Straightforward applications of existing private algorithms lead to a rapid increase in the privacy parameter $ฮต$ in order to ensure consistent estimation under node differential privacy, in contrast with the simpler setting of edge privacy. To alleviate these issues, we develop novel algorithms based on (1) sampling from an exponential mechanism with a Lipschitz extension and (2) a general framework for constructing smooth projections from the space of undirected graphs to the space of bounded-degree graphs, which can then be combined with various edge-private algorithms. Importantly, the methods we develop are all computable in polynomial-time as a function of the number of nodes in the graph. We also develop novel lower bounds on the growth rate of $ฮต$ required in order to achieve consistent community estimation under node privacy. On a technical note, our paper highlights the complications that arise when analyzing private algorithms under the non-standard scaling $ฮต\rightarrow \infty$ and proposes some solutions. We also provide a novel application of the HGR maximal correlation from information theory in the context of accuracy amplification in PAC learning, which may be of independent interest.


The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

arXiv.org Machine Learning

Differential privacy changes the effective sample size governing CVaR learning. For tail mass $ฯ„$, the privacy-relevant sample size is not $n$, but $nฯ„$; equivalently, the effective private tail sample size is $ฮตnฯ„$. Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. This decomposition is complete for scalar estimation and finite classes: scalar estimation has rate $ฮ˜(B \min\{1,(nฯ„)^{-1/2}+(ฮตnฯ„)^{-1}\})$, and finite classes of size $M$ have rate $ฮ˜(B \min\{1,\sqrt{\log(2M)/(nฯ„)}+\log(2M)/(ฮตnฯ„)\})$. These complete rates hold under pure DP, and their lower bounds extend to approximate DP in the stated small-$ฮด$ regimes. For convex Lipschitz learning, modular upper and lower reductions show that the CVaR-specific privacy term necessarily scales as $1/(ฮตnฯ„)$, with dimension dependence inherited from private stochastic convex optimization. Together, these results identify ordinary private learning on $ฮ˜(nฯ„)$ informative tail records as the canonical hard subproblem inside private CVaR learning.


Differentially Private Sampling from Distributions via Wasserstein Projection

arXiv.org Machine Learning

In this paper, we study the problem of sampling from a distribution under the constraint of differential privacy (DP). Prior works measure the utility of DP sampling with density ratio-based measures such as KL divergence. However, such formulations suffer from two key limitations: 1) they fail to capture the geometric structure of the support, and 2) they are not applicable when the supports of the distributions differ. To deal with these issues, we develop a novel framework for DP sampling with Wasserstein distance as the utility measure. In this formulation, we propose Wasserstein Projection Mechanism (WPM), a minimax optimal mechanism based on Wasserstein projection. Furthermore, we develop efficient algorithms for computing the proposed mechanisms approximately and provide convergence guarantees.


Integrating Feature Correlation in Differential Privacy with Applications in DP-ERM

arXiv.org Machine Learning

Standard differential privacy imposes uniform privacy constraints across all features, overlooking the inherent distinction between sensitive and insensitive features in practice. In this paper, we introduce a relaxed definition of differential privacy that accounts for such heterogeneity, allowing certain features to be treated as insensitive even when correlated with sensitive ones. We propose a correlation-aware framework, $\textsf{CorrDP}$, which relaxes privacy for insensitive features while accounting for their correlations with sensitive features, with the correlations quantified using total variation distance. We design algorithms for differentially private empirical risk minimization (DP-ERM) under the $\textsf{CorrDP}$ framework, incorporating distance-dependent noise into gradients for improved theoretical utility guarantees. When the correlation distance is unknown, we estimate it from the dataset and show that it achieves a comparable privacy-utility guarantee. We perform experiments on synthetic and real-world datasets and show that $\textsf{CorrDP}$-based DP-ERM algorithms consistently outperform the standard DP framework in the presence of insensitive features.